
Multi data path config can cause a shard to be perceived as corrupted #4674

Closed
kimchy opened this issue Jan 9, 2014 · 3 comments

kimchy commented Jan 9, 2014

With a multi data path config, each file is written to a data location chosen by available size (by default). Lucene's segments.gen file always has the same name, and for that one file we need to make sure it is always written to the same data location; otherwise the index can end up with multiple segments.gen files, and the shard can appear to be corrupted.

When this happens, the error message is that a segments_xxx file was not found, and a find for segments.gen across the data locations can yield multiple files. Deleting the segments.gen files will cause the shard to recover properly, since the file is only an extra protection layer Lucene uses to resolve the segments header.
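For illustration, a minimal Java sketch of the idea behind the fix (the class and method names here are hypothetical, not the actual Elasticsearch distributor code): ordinary index files go to the data path with the most usable space, while segments.gen is pinned to one fixed path.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not the actual Elasticsearch code: ordinary index
// files go to the data path with the most usable space, but segments.gen
// is pinned to one fixed path so stale copies can never accumulate
// across data locations.
class DataPathPicker {
    private final List<Path> dataPaths;

    DataPathPicker(List<Path> dataPaths) {
        this.dataPaths = dataPaths;
    }

    Path pathFor(String fileName) {
        if ("segments.gen".equals(fileName)) {
            // Always the first configured path, regardless of free space.
            return dataPaths.get(0);
        }
        // Default policy: pick the path with the most usable space.
        return dataPaths.stream()
                .max(Comparator.comparingLong(this::usableSpace))
                .orElseThrow(IllegalStateException::new);
    }

    private long usableSpace(Path path) {
        try {
            return Files.getFileStore(path).getUsableSpace();
        } catch (IOException e) {
            return 0L; // treat unreadable stores as full
        }
    }
}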

kimchy added a commit to kimchy/elasticsearch that referenced this issue Jan 9, 2014

Make sure the segments.gen file is written to the same directory every time
fixes elastic#4674
kimchy closed this as completed in da680be on Jan 9, 2014
kimchy added a commit that referenced this issue Jan 9, 2014

Make sure the segments.gen file is written to the same directory every time
fixes #4674

sbarton commented Jan 15, 2014

Hi,

I was using ES 0.90.7, and after a restart of the cluster my shards were going missing. Physically, I can still see the files on disk (in 6 cases out of 8), but the shards won't come up, with the indication that a segments_X file is missing. I followed your advice of removing the segments.gen file in order to recover, but the shards are not coming up. Even after restarting the cluster the shards are still not coming up; the error is now:

[2014-01-15 18:46:00,504][DEBUG][cluster.service ] [Synch] processing [shard-failed ([1millionnewv2][4], node[fWigetX5QNar2zIpvkEK_Q], [P], s[INITIALIZING]), reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[1millionnewv2][4] failed to fetch index version after copying it over]; nested: IndexShardGatewayRecoveryException[[1millionnewv2][4] shard allocated for local recovery (post api), should exist, but doesn't, current files: [_s41.nvd, _twd_es090_0.tip, _um4.nvm,... ]]; nested: IndexNotFoundException[no segments* file found in store(least_used[rate_limited(niofs(/0/elasticsearch/elasticsearch/nodes/0/indices/1millionnewv2/4/index), type=MERGE, rate=20.0), rate_limited(niofs(/1/elasticsearch/elasticsearch/nodes/0/indices/1millionnewv2/4/index), type=MERGE, rate=20.0), rate_limited(niofs(/2/elasticsearch/elasticsearch/nodes/0/indices/1millionnewv2/4/index), type=MERGE, rate=20.0), rate_limited(niofs(/3/elasticsearch/elasticsearch/nodes/0/indices/1millionnewv2/4/index), type=MERGE, rate=20.0)]): files: [ .... ] ]]]: no change in cluster_state

What can I do if the usual strategy of removing the segments.gen file doesn't work? I have even updated ES to the latest 0.90.10 version, but the restart problems are still the same and the shards are not coming up. I have lost a considerable amount of data in the 2 shards that were wiped out, but I could still save a lot of data in the 6 shards I could recover.

brusic pushed a commit to brusic/elasticsearch that referenced this issue Jan 19, 2014

Make sure the segments.gen file is written to the same directory every time
fixes elastic#4674

kimchy commented Jan 20, 2014

@sbarton are you sure you deleted the segments.gen from all data directories for the relevant shard ([1millionnewv2][4]) (or, better yet, just delete it recursively across all data locations)? The failure suggests you potentially didn't.
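For anyone hitting this, a one-off cleanup sketch in Java (JDK 8+) that deletes every segments.gen recursively under each data path, roughly equivalent to a find -name segments.gen -delete across the data locations. The root paths below are examples taken from the log above; adjust them to your own path.data settings, and stop the node before running it.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// One-off cleanup sketch: recursively delete every segments.gen under each
// configured data path. The roots below are examples from the log above;
// adjust them to your own path.data settings.
public class PurgeSegmentsGen {
    public static void main(String[] args) throws IOException {
        String[] roots = {"/0/elasticsearch", "/1/elasticsearch",
                          "/2/elasticsearch", "/3/elasticsearch"};
        for (String root : roots) {
            try (Stream<Path> files = Files.walk(Paths.get(root))) {
                files.filter(p -> "segments.gen".equals(p.getFileName().toString()))
                     .forEach(p -> {
                         try {
                             Files.delete(p);
                             System.out.println("deleted " + p);
                         } catch (IOException e) {
                             System.err.println("failed to delete " + p + ": " + e);
                         }
                     });
            }
        }
    }
}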


sbarton commented Feb 6, 2014

@kimchy I made sure I removed the segments.gen from all data directories (using the find command), but still no luck. In the end I gave up (I had more shards in the same situation on other machines; I tried the same approach, but none of them came back) and re-indexed the whole thing once again. But I can say that the 0.90.10 version is not losing shards even after several harsh restarts.

mute pushed a commit to mute/elasticsearch that referenced this issue Jul 29, 2015

Make sure the segments.gen file is written to the same directory every time
fixes elastic#4674